Document Embeddings as a Means of Domain Adaptation for the Machine Translation
نویسنده
چکیده
منابع مشابه
Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings
We propose a new method for extracting pseudo-parallel sentences from a pair of large monolingual corpora, without relying on any document-level information. Our method first exploits word embeddings in order to efficiently evaluate trillions of candidate sentence pairs and then a classifier to find the most reliable ones. We report significant improvements in domain adaptation for statistical ...
متن کاملImage alignment via kernelized feature learning
Machine learning is an application of artificial intelligence that is able to automatically learn and improve from experience without being explicitly programmed. The primary assumption for most of the machine learning algorithms is that the training set (source domain) and the test set (target domain) follow from the same probability distribution. However, in most of the real-world application...
متن کاملExtracting and Selecting Relevant Corpora for Domain Adaptation in MT
The paper presents scheme for doing Domain Adaptation for multiple domains simultaneously. The proposed method segments a large corpus into various parts using self-organizing maps (SOMs). After a SOM is drawn over the documents, an agglomerative clustering algorithm determines how many clusters the text collection comprised. This means that the clustering process is unsupervised, although choi...
متن کاملTranslation Technology Tools and Professional Translators’ Attitudes toward Them
Today technology is an integral part of professional translation; and it is generally assumed that translators’ attitudes toward translation technology tools influence their interaction with technology (Bundgaard, 2017). Therefore, the present two-phase study seeks to shed some light on what translation technology tools are and how professional translators feel toward them. The research method ...
متن کاملDomain Adaptation of Statistical Machine Translation Models with Monolingual Data for Cross Lingual Information Retrieval
Statistical Machine Translation (SMT) is often used as a black-box in CLIR tasks. We propose an adaptation method for an SMT model relying on the monolingual statistics that can be extracted from the document collection (both source and target if available). We evaluate our approach on CLEF Domain Specific task (German-English and English-German) and show that very simple document collection st...
متن کامل